AITopics | discovery task

discovery task

Training Details and Model

Neural Information Processing Systems | Apr-25-2026, 16:03:12 GMT

We set the patch size to 8. Our model is optimized with the AdamW optimizer [3] using a learning rate of 0.0004, 250k training steps, a linear warm-up of 5000 steps, and an exponentially weight-decaying schedule. The gradient norm is clipped at 1. We use PyTorch automatic mixed precision and data parallelism for training acceleration. All models are trained on 4 Nvidia RTX A5000 GPUs with a total batch size of 128.
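The schedule described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes the exponential decay applies to the learning rate after warm-up, and the end-of-training ratio `FINAL_LR_RATIO` is a hypothetical value that the text does not give. The clipping helper applies the usual global-norm rule to a flat list of gradient values.

```python
import math
from typing import List

BASE_LR = 4e-4          # learning rate stated in the text
WARMUP_STEPS = 5_000    # linear warm-up length stated in the text
TOTAL_STEPS = 250_000   # number of training steps stated in the text
FINAL_LR_RATIO = 0.01   # ASSUMPTION: final/base ratio, not given in the text


def learning_rate(step: int) -> float:
    """Linear warm-up to BASE_LR, then exponential decay (assumed form)."""
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * FINAL_LR_RATIO ** progress


def clip_by_global_norm(grads: List[float], max_norm: float = 1.0) -> List[float]:
    """Rescale gradients so their L2 norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]
```

In a PyTorch training loop the same shape would typically be expressed with a scheduler and a gradient-clipping call; the pure-Python form is used here only so the arithmetic is easy to inspect.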

artificial intelligence, dataset, machine learning, (15 more...)

Country: North America > Canada (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

HiBug: On Human-Interpretable Model Debug

Muxi Chen, Yu Li, Qiang Xu (The Chinese University of Hong Kong)

Neural Information Processing Systems | Feb-7-2026, 21:45:05 GMT

Experimental results demonstrate the efficacy of the HiBug framework.

artificial intelligence, machine learning, natural language, (20 more...)

Country:

Asia > China > Hong Kong (0.40)
North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)

DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

Majumder, Bodhisattwa Prasad, Surana, Harshit, Agarwal, Dhruv, Mishra, Bhavana Dalvi, Meena, Abhijeetsingh, Prakhar, Aryan, Vora, Tirth, Khot, Tushar, Sabharwal, Ashish, Clark, Peter

arXiv.org Artificial Intelligence | Jul-1-2024

Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systematically assess current model capabilities in discovery tasks and provide a useful resource for improving them. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering, by manually deriving discovery workflows from published papers to approximate the real-world challenges faced by researchers, where each task is defined by a dataset, its metadata, and a discovery goal in natural language. We additionally provide 903 synthetic tasks to conduct controlled evaluations across task complexity. Furthermore, our structured formalism of data-driven discovery enables a facet-based evaluation that provides useful insights into different failure modes. We evaluate several popular LLM-based reasoning frameworks using both open and closed LLMs as baselines on DiscoveryBench and find that even the best system scores only 25%. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
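The task definition in the abstract (a dataset, its metadata, and a natural-language discovery goal) can be pictured as a small record. The sketch below is illustrative only: the class name, field names, file name, and example values are hypothetical and do not reflect the benchmark's actual file format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DiscoveryTask:
    """One task: a dataset, its metadata, and a natural-language goal."""
    dataset_path: str                     # hypothetical field name
    column_descriptions: Dict[str, str]   # hypothetical metadata layout
    goal: str                             # discovery goal in natural language


def build_prompt(task: DiscoveryTask) -> str:
    """Render a task as plain text that could be handed to an LLM agent."""
    lines = [f"Dataset: {task.dataset_path}", "Columns:"]
    for name, description in task.column_descriptions.items():
        lines.append(f"- {name}: {description}")
    lines.append(f"Goal: {task.goal}")
    return "\n".join(lines)


# Made-up example values, for illustration only.
example = DiscoveryTask(
    dataset_path="survey.csv",
    column_descriptions={"age": "respondent age in years",
                         "income": "annual income"},
    goal="How does income vary with age?",
)
prompt = build_prompt(example)
```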

dataset, hypothesis, workflow, (17 more...)

arXiv: 2407.01725

Country:

Europe > Spain > Catalonia (0.04)
Asia (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(4 more...)

Genre:

Workflow (0.71)
Research Report (0.64)
Overview (0.46)

Industry:

Information Technology > Services (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

LLM2FEA: Discover Novel Designs with Generative Evolutionary Multitasking

Wong, Melvin, Liu, Jiao, Rios, Thiago, Menzel, Stefan, Ong, Yew Soon

arXiv.org Artificial Intelligence | Jun-21-2024

The rapid research and development of generative artificial intelligence has enabled the generation of high-quality images, text, and 3D models from text prompts. This advancement impels an inquiry into whether these models can be leveraged to create digital artifacts for both creative and engineering applications. Drawing on innovative designs from other domains may be one answer to this question, much like the historical practice of "bionics", where humans have sought inspiration from nature's exemplary designs. This raises the intriguing possibility of using generative models to simultaneously tackle design tasks across multiple domains, facilitating cross-domain learning and resulting in a series of innovative design solutions. In this paper, we propose LLM2FEA as the first attempt to discover novel designs in generative models by transferring knowledge across multiple domains. By utilizing a multi-factorial evolutionary algorithm (MFEA) to drive a large language model, LLM2FEA integrates knowledge from various fields to generate prompts that guide the generative model in discovering novel and practical objects. Experimental results in the context of 3D aerodynamic design verify the discovery capabilities of the proposed LLM2FEA. The designs generated by LLM2FEA not only satisfy practicality requirements to a certain degree but also feature novel and aesthetically pleasing shapes, demonstrating the potential applications of LLM2FEA in discovery tasks.

algorithm, discovery task, optimization, (14 more...)

arXiv: 2406.14917

Country:

Asia > Singapore (0.04)
Europe > Germany (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Air (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(2 more...)

CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems

Feng, Yanlin, Rahman, Sajjadur, Feng, Aaron, Chen, Vincent, Kandogan, Eser

arXiv.org Artificial Intelligence | Jun-1-2024

Compound AI systems (CASs) that employ LLMs as agents to accomplish knowledge-intensive tasks via interactions with tools and data retrievers have garnered significant interest within database and AI communities. While these systems have the potential to supplement typical analysis workflows of data analysts in enterprise data platforms, unfortunately, CASs are subject to the same data discovery challenges that analysts have encountered over the years -- silos of multimodal data sources, created across teams and departments within an organization, make it difficult to identify appropriate data sources for accomplishing the task at hand. Existing data discovery benchmarks do not model such multimodality and multiplicity of data sources. Moreover, benchmarks of CASs prioritize only evaluating end-to-end task performance. To catalyze research on evaluating the data discovery performance of multimodal data retrievers in CASs within a real-world setting, we propose CMDBench, a benchmark modeling the complexity of enterprise data platforms. We adapt existing datasets and benchmarks in open-domain -- from question answering and complex reasoning tasks to natural language querying over structured data -- to evaluate coarse- and fine-grained data discovery and task execution performance. Our experiments reveal the impact of data retriever design on downstream task performance -- a 46% drop in task accuracy on average -- across various modalities, data sources, and task difficulty. The results indicate the need to develop optimization strategies to identify appropriate LLM agents and retrievers for efficient execution of CASs over enterprise data.

benchmark, graph, modality, (13 more...)

doi: 10.1145/3665601.3669846

arXiv: 2406.00583

Country:

Asia > Japan (0.04)
North America > United States > Utah (0.04)
North America > United States > Minnesota (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports > Basketball (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Neural scaling laws for phenotypic drug discovery

Linsley, Drew, Griffin, John, Brown, Jason Parker, Roose, Adam N, Frank, Michael, Linsley, Peter, Finkbeiner, Steven, Linsley, Jeremy

arXiv.org Artificial Intelligence | Sep-28-2023

Recent breakthroughs by deep neural networks (DNNs) in natural language processing (NLP) and computer vision have been driven by a scale-up of models and data rather than the discovery of novel computing paradigms. Here, we investigate if scale can have a similar impact for models designed to aid small molecule drug discovery. We address this question through a large-scale and systematic analysis of how DNN size, data diet, and learning routines interact to impact accuracy on our Phenotypic Chemistry Arena (Pheno-CA) benchmark: a diverse set of drug development tasks posed on image-based high content screening data. Surprisingly, we find that DNNs explicitly supervised to solve tasks in the Pheno-CA do not continuously improve as their data and model size are scaled up. To address this issue, we introduce a novel precursor task, the Inverse Biological Process (IBP), which is designed to resemble the causal objective functions that have proven successful for NLP. We indeed find that DNNs first trained with IBP then probed for performance on the Pheno-CA significantly outperform task-supervised DNNs. More importantly, the performance of these IBP-trained DNNs monotonically improves with data and model scale. Our findings reveal that the DNN ingredients needed to accurately solve small molecule drug development tasks are already in our hands, and project how much more experimental data is needed to achieve any desired level of improvement. We release our Pheno-CA benchmark and code to encourage further study of neural scaling laws for small molecule drug discovery.

deconvolution, dnn, molecule, (17 more...)

arXiv: 2309.16773

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > Montserrat (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cross Modal Data Discovery over Structured and Unstructured Data Lakes

Eltabakh, Mohamed Y., Kunjir, Mayuresh, Elmagarmid, Ahmed, Ahmad, Mohammad Shahmeer

arXiv.org Artificial Intelligence | Jul-16-2023

Organizations are collecting increasingly large amounts of data for data-driven decision making. These data are often dumped into a centralized repository, e.g., a data lake, consisting of thousands of structured and unstructured datasets. Perversely, such a mixture of datasets makes it very challenging to discover elements (e.g., tables or documents) that are relevant to a user's query or an analytical task. Despite recent efforts in data discovery, the problem remains wide open, especially on two fronts: (1) discovering relationships and relatedness across structured and unstructured datasets, where existing techniques either scale poorly, are customized for a specific problem type (e.g., entity matching or data integration), or destroy the structural properties of the data along the way, and (2) developing a holistic system that integrates various similarity measurements and sketches effectively to boost discovery accuracy. In this paper, we propose a new data discovery system, named CMDL, for addressing these two limitations. CMDL supports the data discovery process over both structured and unstructured data while retaining the structural properties of tables.

data mining, discovery, machine learning, (27 more...)

arXiv: 2306.00932

Country:

Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
Asia > Middle East > Qatar (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Information Technology > Services (0.67)
Health & Medicine > Therapeutic Area > Oncology (0.46)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

LakeBench: Benchmarks for Data Discovery over Data Lakes

Srinivas, Kavitha, Dolby, Julian, Abdelaziz, Ibrahim, Hassanzadeh, Oktie, Kokel, Harsha, Khatiwada, Aamod, Pedapati, Tejaswini, Chaudhury, Subhajit, Samulowitz, Horst

arXiv.org Artificial Intelligence | Jul-9-2023

Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private datasets. In LakeBench, we develop multiple benchmarks for these tasks by using the tables that are drawn from a diverse set of data sources such as government data from CKAN, Socrata, and the European Central Bank. We compare the performance of 4 publicly available tabular foundational models on these tasks. None of the existing models had been trained on the data discovery tasks that we developed for this benchmark; not surprisingly, their performance shows significant room for improvement. The results suggest that the establishment of such benchmarks may be useful to the community to build tabular models usable for data discovery in data lakes.
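The three kinds of table relatedness named above (unionable, joinable, subset) can be illustrated with naive set-overlap heuristics. This is a rough sketch under stated assumptions, not the benchmark's definitions or the evaluated models' method: the `Table` representation, the function names, and the choice of value containment and column-name Jaccard overlap as proxies are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Table = Dict[str, List[str]]  # column name -> cell values (assumed layout)


def containment(a: List[str], b: List[str]) -> float:
    """Fraction of distinct values in `a` that also appear in `b`."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa) if sa else 0.0


def best_join_columns(left: Table, right: Table) -> Tuple[str, str, float]:
    """Joinability proxy: the column pair with the highest value containment."""
    best = ("", "", 0.0)
    for lname, lvals in left.items():
        for rname, rvals in right.items():
            score = containment(lvals, rvals)
            if score > best[2]:
                best = (lname, rname, score)
    return best


def schema_overlap(left: Table, right: Table) -> float:
    """Unionability proxy: Jaccard overlap of the column names."""
    l, r = set(left), set(right)
    return len(l & r) / len(l | r) if (l | r) else 0.0


def row_set(table: Table) -> Set[Tuple[str, ...]]:
    """Rows as tuples, with columns taken in sorted-name order."""
    cols = sorted(table)
    if not cols:
        return set()
    return {tuple(table[c][i] for c in cols) for i in range(len(table[cols[0]]))}


def is_row_subset(small: Table, big: Table) -> bool:
    """Subset proxy: same columns, and every row of `small` occurs in `big`."""
    return set(small) == set(big) and row_set(small) <= row_set(big)
```

Exact-match heuristics like these break down on real data lakes, where column names differ and values are noisy, which is the gap that learned tabular models are meant to close.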

benchmark, data mining, machine learning, (20 more...)

arXiv: 2307.04217

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Slovakia (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
(11 more...)

Genre: Research Report (0.84)

Industry:

Government > Regional Government > Europe Government (0.34)
Banking & Finance > Economy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.83)

Word Discovery in Visually Grounded, Self-Supervised Speech Models

Peng, Puyuan, Harwath, David

arXiv.org Artificial Intelligence | Jun-19-2023

... powerful word segmentation and clustering capability emerges within the model's self-attention heads. Our experiments reveal that this ability is not present to nearly the same extent in the base HuBERT and wav2vec2.0 ...

Our method is simple: it simply involves applying a binary threshold to the self-attention maps produced by the model, and extracting contiguous temporal regions of the speech signal with ...
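The excerpt describes applying a binary threshold to self-attention maps and extracting contiguous temporal regions. A minimal sketch of that idea on a single 1-D sequence of per-frame attention weights follows; the function name, the input layout, and the threshold value are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Tuple


def attention_segments(weights: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return half-open frame ranges [start, end) where weight >= threshold."""
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current above-threshold run began
    for i, w in enumerate(weights):
        if w >= threshold and start is None:
            start = i                      # a run begins
        elif w < threshold and start is not None:
            segments.append((start, i))    # the run ended at frame i
            start = None
    if start is not None:                  # a run reaches the last frame
        segments.append((start, len(weights)))
    return segments


# Made-up attention weights over 8 speech frames.
frames = [0.1, 0.7, 0.9, 0.2, 0.05, 0.8, 0.6, 0.1]
regions = attention_segments(frames, threshold=0.5)
```

Each returned range is a candidate word-like segment; mapping frame indices back to time requires the model's frame rate, which is not stated in the excerpt.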

attention segment, machine learning, natural language, (17 more...)

arXiv: 2203.15081

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (1.00)

Technology: